
    BRAHMA(+): A Framework for Resource Scaling of Streaming and ASAP Time-Varying Workflows

    Automatic scaling of complex software-as-a-service application workflows is one of the most important problems concerning resource management in clouds. In this paper, we study the automatic workflow resource scaling problem for streaming and ASAP workflows, and its time-varying variant where the workflow resource requirements change over time. Service components of streaming workflows execute concurrently, while those of ASAP workflows execute sequentially. We propose an intelligent framework, BRAHMA(+), which learns the workflow behavior and constructs a knowledge base that serves as its decision-making engine. The proposed resource provisioning algorithms leverage this learned information, curated in the knowledge base, to make informed and intelligent scaling decisions. Additionally, BRAHMA(+) employs online-learning strategies to keep the knowledge base up-to-date, thereby accommodating changes in the workflow resource requirements over time. We evaluate the proposed algorithms using CloudSim simulations. Results on streaming and ASAP workflows, with both static and time-varying resource requirements, show that the proposed algorithms are effective and produce good cost-quality trade-offs. The proactive and hybrid algorithms meet the service level agreements and restrict deadline violations to a small fraction (3%-5% in the considered scenarios), while only suffering a marginal increase in average cost per component compared to the described baseline algorithms.
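
    The knowledge-base-driven, proactive idea can be sketched roughly as follows: a minimal Python illustration, assuming a simple exponential-moving-average update and invented names such as KnowledgeBase and proactive_scale; the paper's actual algorithms and CloudSim setup are more elaborate.

        import math

        ALPHA = 0.3  # smoothing factor for online updates (assumed value)

        class KnowledgeBase:
            def __init__(self):
                # (component_id, time_slot) -> learned estimate of VMs needed
                self.estimates = {}

            def update(self, component_id, time_slot, observed_demand):
                key = (component_id, time_slot)
                old = self.estimates.get(key, observed_demand)
                # Online learning: blend each new observation into the stored
                # estimate so the knowledge base tracks time-varying requirements.
                self.estimates[key] = (1 - ALPHA) * old + ALPHA * observed_demand

            def predict(self, component_id, time_slot):
                return self.estimates.get((component_id, time_slot))

        def proactive_scale(kb, component_id, time_slot, current_vms):
            """Return how many VMs to add (positive) or release (negative)."""
            predicted = kb.predict(component_id, time_slot)
            if predicted is None:
                return 0  # nothing learned yet; a hybrid variant falls back to monitoring
            return max(1, math.ceil(predicted)) - current_vms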

    Design and evaluation of automatic workflow scaling algorithms for multi-tenant SaaS

    Current cloud software development efforts to build novel Software-as-a-Service (SaaS) applications, just like traditional software development, usually no longer start from scratch. Instead, more and more cloud developers opt to reuse multiple existing components and integrate them into their application workflow. Scaling the resulting application up or down with user/tenant load, in order to keep the SLA, is then no longer a matter of scaling resources for a single service; it becomes the complex problem of scaling all individual service endpoints in the workflow according to their monitored runtime behavior. In this paper, we propose and evaluate, through CloudSim simulations, algorithms for automatic runtime scaling of such multi-tenant SaaS workflows. Our results on time-varying workloads show that the proposed algorithms are effective and produce the best cost-quality trade-offs while keeping Service Level Agreements (SLAs) intact. Empirically, the proactive algorithm with careful parameter tuning always meets the SLAs while only suffering a marginal increase in average cost per service component of approximately 5-8% over our baseline passive algorithm, which, although the least costly, suffers from prolonged violation of service component SLAs.
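
    As a hedged illustration of scaling each endpoint independently from its monitored load, the sketch below provisions proactive headroom per endpoint, while the passive baseline keeps none. The endpoint names, rates, capacities, and the 20% headroom are invented example numbers, not values from the paper.

        import math

        def required_instances(request_rate, capacity_per_instance, headroom=0.2):
            """Instances needed to serve `request_rate` with `headroom` spare capacity."""
            return math.ceil(request_rate * (1 + headroom) / capacity_per_instance)

        def scale_workflow(endpoints, proactive=True):
            """endpoints: list of dicts with monitored rate, per-instance capacity, count."""
            decisions = {}
            for ep in endpoints:
                headroom = 0.2 if proactive else 0.0  # passive keeps no spare capacity
                target = required_instances(ep["rate"], ep["capacity"], headroom)
                decisions[ep["name"]] = target - ep["instances"]
            return decisions

        # Example: a three-component workflow under the current monitored load.
        workflow = [
            {"name": "frontend", "rate": 120.0, "capacity": 50.0, "instances": 2},
            {"name": "auth",     "rate": 120.0, "capacity": 80.0, "instances": 1},
            {"name": "storage",  "rate": 40.0,  "capacity": 30.0, "instances": 1},
        ]
        print(scale_workflow(workflow))  # per-endpoint add/remove decisions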

    UnifyDR: A Generic Framework for Unifying Data and Replica Placement

    The advent of (big) data management applications operating at cloud scale has led to extensive research on the data placement problem. The key objective of data placement is to obtain a partitioning (possibly allowing for replicas) of a set of data-items into distributed nodes that minimizes the overall network communication cost. Although replication is intrinsic to data placement, it has seldom been studied in combination with the latter. On the contrary, most existing solutions treat them as two independent problems and employ a two-phase approach: (1) data placement, followed by (2) replica placement. We address this by proposing a new paradigm, CDR, with the objective of Combining Data and Replica placement as a single joint optimization problem. Specifically, we study two variants of the CDR problem: (1) CDR-Single, where the objective is to minimize the communication cost alone, and (2) CDR-Multi, which performs a multi-objective optimization to also minimize traffic and storage costs. To unify data and replica placement, we propose a generic framework called UnifyDR, which leverages overlapping correlation clustering to assign a data-item to multiple nodes, thereby facilitating data and replica placement to be performed jointly. We establish the generic nature of UnifyDR by portraying its ability to address the CDR problem in two real-world use-cases: join-intensive online analytical processing (OLAP) queries and a location-based online social network (OSN) service. The effectiveness and scalability of UnifyDR are showcased by experiments performed on data generated using the TPC-DS benchmark and a trace of the Gowalla OSN for the OLAP-query and OSN-service use-cases, respectively. Empirically, the presented approach obtains an improvement of approximately 35% in terms of the evaluated metrics and a speed-up of 8 times in comparison to state-of-the-art techniques. This work was supported by the Agentschap Innoveren & Ondernemen (VLAIO) Strategic Fundamental Research (SBO) under Grant 150038 (DiSSeCt).
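
    To make the joint-optimization idea concrete, here is a deliberately simplified sketch: candidate nodes are scored with a CDR-Multi-style weighted sum of communication, traffic, and storage costs, and each data-item keeps its cheapest k nodes in one pass, so replicas fall out of the same step as placement. All names, weights, and cost numbers are illustrative assumptions; UnifyDR's overlapping correlation clustering is considerably more sophisticated.

        # Per-(item, node) costs would come from query logs; small invented
        # numbers are used here so the example runs.
        COMM = {("d1", "n1"): 1.0, ("d1", "n2"): 4.0, ("d1", "n3"): 2.0,
                ("d2", "n1"): 3.0, ("d2", "n2"): 1.0, ("d2", "n3"): 2.5}
        TRAFFIC = {("d1", "n1"): 0.5, ("d1", "n2"): 2.0, ("d1", "n3"): 1.0,
                   ("d2", "n1"): 2.0, ("d2", "n2"): 0.5, ("d2", "n3"): 1.5}
        STORAGE = {k: 1.0 for k in COMM}  # uniform storage cost, assumed

        def placement_cost(item, node, w_comm=1.0, w_traffic=0.5, w_storage=0.1):
            # CDR-Multi-style weighted sum over the three cost dimensions.
            key = (item, node)
            return (w_comm * COMM[key] + w_traffic * TRAFFIC[key]
                    + w_storage * STORAGE[key])

        def place_with_replicas(items, nodes, replicas=2):
            """Assign each item to its `replicas` cheapest nodes in one joint step,
            rather than placing first and replicating in a separate second phase."""
            assignment = {}
            for item in items:
                ranked = sorted(nodes, key=lambda n: placement_cost(item, n))
                assignment[item] = ranked[:replicas]  # overlapping: several nodes per item
            return assignment

        print(place_with_replicas(["d1", "d2"], ["n1", "n2", "n3"]))
        # -> {'d1': ['n1', 'n3'], 'd2': ['n2', 'n3']}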

    Unifying data and replica placement for data-intensive services in geographically distributed clouds

    The increased reliance of data management applications on cloud computing technologies has made research into the data placement problem of paramount importance. The objective of the classical data placement problem is to optimally partition, while also allowing for replication, a set of data-items into distributed data centers so as to minimize the overall network communication cost. Despite significant advances in data placement research, replica placement has seldom been studied in unison with data placement. More specifically, most existing solutions employ a two-phase approach: (1) data placement, followed by (2) replication. Replication should, however, be seen as an integral part of data placement and studied as a joint optimization problem with the latter. In this paper, we propose a unified paradigm of data placement, called CPR, which combines data placement and replication for data-intensive services in geographically distributed clouds as a joint optimization problem. Underneath CPR lies an overlapping correlation clustering algorithm capable of assigning a data-item to multiple data centers, thereby enabling us to solve data placement and replication jointly. Experiments on a real-world trace-based online social network dataset show that CPR is effective and scalable. Empirically, it is approximately 35% better in efficacy on the evaluated metrics, while being up to 8 times faster in execution time when compared to state-of-the-art techniques.
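
    The overlapping-clustering intuition can be hedged into a few lines: a data-item joins every data center whose co-access affinity exceeds a threshold, so a single pass yields both the placement and its replicas. The affinity scores, names, and the 0.5 threshold below are invented for illustration and are not CPR's actual clustering objective.

        def overlapping_assign(affinity, threshold=0.5):
            """affinity: dict mapping (item, data_center) -> co-access score in [0, 1]."""
            clusters = {}
            for (item, dc), score in affinity.items():
                if score >= threshold:
                    clusters.setdefault(item, []).append(dc)  # item may join several DCs
            return clusters

        affinity = {
            ("photos_eu", "dc_frankfurt"): 0.9,
            ("photos_eu", "dc_virginia"):  0.6,  # above threshold: replicated
            ("photos_eu", "dc_tokyo"):     0.1,
            ("feed_us",   "dc_virginia"):  0.8,
        }
        print(overlapping_assign(affinity))
        # -> {'photos_eu': ['dc_frankfurt', 'dc_virginia'], 'feed_us': ['dc_virginia']}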

    BRAHMA: an intelligent framework for automated scaling of streaming and deadline-critical workflows

    The prevalent use of multi-component, multi-tenant models for building novel Software-as-a-Service (SaaS) applications has resulted in widespread research on automatic scaling of the resultant complex application workflows. In this paper, we propose a holistic solution to Automatic Workflow Scaling under the combined presence of Streaming and Deadline-critical workflows, called AWS-SD. To solve the AWS-SD problem, we propose a framework, BRAHMA, that learns workflow behavior to build a knowledge base and leverages this information to make intelligent automated scaling decisions. We propose and evaluate different resource provisioning algorithms through CloudSim simulations. Our results on time-varying workloads show that the proposed algorithms are effective and produce good cost-quality trade-offs while preventing deadline violations. Empirically, the proposed hybrid algorithm, which combines learning and monitoring, restricts deadline violations to a small fraction (3-5%), while only suffering a marginal increase in average cost per component of 1-2% over our baseline naive algorithm, which provides the least costly provisioning but suffers from a large number (35-45%) of deadline violations.
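
    A rough sketch of the hybrid idea of combining learning and monitoring: act on the knowledge base when it has an entry for the upcoming interval, and fall back to reactive, utilization-driven scaling otherwise. The thresholds and the one-VM step below are assumptions for illustration, not BRAHMA's actual policy.

        import math

        def hybrid_decision(kb_prediction, observed_utilization, current_vms,
                            high_water=0.8, low_water=0.3):
            if kb_prediction is not None:
                # Proactive path: provision what the knowledge base expects.
                return max(1, math.ceil(kb_prediction)) - current_vms
            # Reactive fallback driven by monitored utilization.
            if observed_utilization > high_water:
                return 1   # scale out before deadlines start slipping
            if observed_utilization < low_water and current_vms > 1:
                return -1  # scale in to save cost
            return 0

        print(hybrid_decision(kb_prediction=4.0, observed_utilization=0.5, current_vms=3))  # -> 1 (proactive)
        print(hybrid_decision(kb_prediction=None, observed_utilization=0.9, current_vms=3))  # -> 1 (reactive)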

    Intrusion Detection System Based on Integrated System Calls Graph and Neural Networks

    Computer security is one of the main challenges of today’s technological infrastructures, and intrusion detection systems are among the most widely used technologies to secure computer systems. Intrusion detection systems draw on a variety of information sources, one of the most important being applications’ system calls. They also use many different detection techniques, e.g., system call sequences, text classification techniques, and system call graphs. However, existing techniques obtain poor results when detecting complex attack patterns, so the detection results need to be improved. This paper presents an intrusion detection system model that integrates multiple detection techniques into a single system with the goal of modeling the global behavior of applications. In addition, the paper proposes a new modified system call graph to integrate and represent the information of the different techniques in a single data structure. The system uses a deep neural network to combine the results of the different detection techniques used in the global model. The results of the study show the improvement obtained over the use of individual techniques: the proposed model achieves higher detection rates and lower false positives. The proposal has been validated on three datasets with different levels of complexity. This work was supported by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital of the Community of Valencia, Spain, within the Program of Support for Research under Project AICO/2020/206.
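
    The fusion step can be sketched as follows: each detection technique (system call sequences, text classification, graph-based analysis) emits an anomaly score, and a neural network learns the final verdict from the combined feature vector. The tiny dataset and the scikit-learn classifier below are placeholder assumptions; the paper uses a deep network over an integrated system-call graph.

        from sklearn.neural_network import MLPClassifier

        # Each row: [sequence_score, text_score, graph_score]; label 1 = intrusion.
        X_train = [
            [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2], [0.1, 0.3, 0.1],  # benign
            [0.9, 0.7, 0.8], [0.8, 0.9, 0.6], [0.7, 0.8, 0.9], [0.9, 0.8, 0.7],  # attacks
        ]
        y_train = [0, 0, 0, 0, 1, 1, 1, 1]

        fusion_net = MLPClassifier(hidden_layer_sizes=(8, 4), max_iter=2000, random_state=0)
        fusion_net.fit(X_train, y_train)

        print(fusion_net.predict([[0.85, 0.75, 0.7]]))  # expected: [1], flagged as intrusion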

    SpeCH: A scalable framework for data placement of data-intensive services in geo-distributed clouds

    The advent of big data analytics and cloud computing technologies has resulted in widespread research on the data placement problem. Since data-intensive services require access to multiple datasets within each transaction, traditional schemes that uniformly partition the data into distributed nodes, as employed by many popular data stores like HDFS or Cassandra, may cause network congestion and thereby hurt system throughput. In this article, we propose a scalable and unified framework for placing the data of data-intensive services into geographically distributed clouds. The proposed framework introduces a new paradigm for partitioning a set of data-items into geo-distributed clouds using Spectral Clustering on Hypergraphs, and is therefore called SpeCH. Scaling spectral methods to large workloads is challenging, since computing the spectrum of the hypergraph Laplacian is a computationally intensive task. SpeCH provides two solutions to this problem: (1) an algorithm, called SpectralApprox, that leverages randomized techniques to obtain low-rank approximations of the hypergraph matrix with bounded guarantees, thereby significantly improving the efficiency of spectral clustering while also providing high-quality solutions in practice; and (2) an algorithm, called SpectralDist, that exploits the highly parallel nature of the spectral clustering algorithm and uses Apache Spark to speed up the process while retaining the same quality guarantees as the exact algorithm. Additionally, being distributed in nature, SpectralDist enables SpeCH to perform data placement on workloads that require resources beyond the capacity of a single machine. Experiments on a real-world trace-based online social network dataset show that SpeCH is effective, efficient, and scalable. Empirically, SpectralApprox is comparable in efficacy on the evaluated metrics, while being up to 10 times faster in execution time when compared to state-of-the-art techniques. On the other hand, though SpectralApprox is 7–8 times faster than SpectralDist, the latter is up to 50% better in efficacy on the evaluated metrics. This research is partly funded by VLAIO, under grant number 140055 (SBO Decomads).
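
    The SpectralApprox idea can be roughly sketched with a randomized low-rank decomposition: embed data-items via a normalized hypergraph incidence matrix (one hyperedge per co-access transaction), then cluster the low-dimensional embedding into data centers. The normalization choice and all data below are assumptions for illustration; SpeCH's exact construction and approximation bounds are in the paper.

        import numpy as np
        from sklearn.utils.extmath import randomized_svd
        from sklearn.cluster import KMeans

        def spectral_placement(H, k):
            """H: incidence matrix (items x hyperedges), one hyperedge per transaction.
            Returns a cluster (data center) id per data-item."""
            d_v = np.maximum(H.sum(axis=1), 1)  # vertex degrees
            d_e = np.maximum(H.sum(axis=0), 1)  # hyperedge degrees
            # Degree-normalized incidence; its leading singular vectors relate to
            # the hypergraph Laplacian's spectrum (a standard normalization, assumed).
            A = (H / np.sqrt(d_v)[:, None]) / np.sqrt(d_e)[None, :]
            # Randomized SVD stands in for the expensive exact eigendecomposition.
            U, _, _ = randomized_svd(A, n_components=k, random_state=0)
            return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

        # Example: 6 data-items, 4 co-access transactions, 2 target data centers.
        H = np.array([[1, 1, 0, 0],
                      [1, 1, 0, 0],
                      [1, 0, 0, 0],
                      [0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [0, 0, 0, 1]], dtype=float)
        print(spectral_placement(H, k=2))  # e.g. [0 0 0 1 1 1] (label ids may swap)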

    Efficient Indexing and Querying in XML Databases

    Optimizing XML queries has lately been an intensively studied problem in the field of databases. The topic has a host of applications, e.g., web-scale XML search and keyword search. In this paper, we address the problem of efficiently executing XML path queries (commonly known as XPath queries), branch queries, and wild-card queries. Our index structure assists in fast identification of child-parent as well as ancestor-descendant relationships, thus increasing the efficiency of XPath query execution. Both XML data and queries possess an inherent tree structure, and thus fast child-parent lookup is a necessity for good performance. We propose a holistic hybrid index structure that combines the Extended Dewey labeling scheme with the CTree index structure to leverage the advantages of both mechanisms. Our index structure caters to all the query types (single-path, branch, and wild-card queries), with equal or better performance compared to the state-of-the-art.
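
    To illustrate why Dewey-style labels make such lookups fast: a node's label extends its parent's label by one component, so child-parent and ancestor-descendant tests reduce to prefix comparisons. This is a minimal sketch of the labeling idea only, with invented labels; the paper's hybrid index couples Extended Dewey labels with a CTree structure.

        def dewey(label):
            return tuple(int(p) for p in label.split("."))

        def is_ancestor(a, d):
            """True if the node labeled `a` is a proper ancestor of the node labeled `d`."""
            a, d = dewey(a), dewey(d)
            return len(a) < len(d) and d[:len(a)] == a

        def is_parent(a, d):
            a, d = dewey(a), dewey(d)
            return len(a) + 1 == len(d) and d[:len(a)] == a

        # /book (1) -> /book/chapter[2] (1.2) -> /book/chapter[2]/title (1.2.1)
        print(is_ancestor("1", "1.2.1"))    # True
        print(is_parent("1.2", "1.2.1"))    # True
        print(is_ancestor("1.3", "1.2.1"))  # False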